feat: add lada-cache:calibrate command for auto TTL calibration #153
acaliskol wants to merge 19 commits
Adds per-model cache TTL overrides without breaking the global
`expiration_time` default. Models can opt-in either via the new
`HasLadaTtl` interface (most flexible) or via a static config map.
Resolution order (first non-null wins):
1. Model implements `HasLadaTtl` → `$model->getLadaTtl()`
2. `config('lada-cache.model_ttls.<FQCN>')` static map
3. `config('lada-cache.expiration_time')` global default
Semantics of TTL values:
- `null` defer to fallback layer
- `> 0` TTL in seconds
- `0` persist forever (cache until tag invalidation) — matches
how global `expiration_time = 0` already behaves
- `< 0` same as 0 (forever) — discouraged
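The resolution order and TTL semantics above can be sketched as a small standalone function. This is a minimal illustration, not the package's `TtlResolver`: the interface is redeclared locally so the snippet runs on its own, and `resolveTtl()`, `Post`, and `Tag` are hypothetical names.

```php
<?php
// Local stand-in for the interface described above (redeclared for self-containment).
interface HasLadaTtl
{
    public function getLadaTtl(): ?int;
}

// Hypothetical example models: one opts in via the interface, one does not.
final class Post implements HasLadaTtl
{
    public function getLadaTtl(): ?int
    {
        return 60;
    }
}

final class Tag
{
}

/**
 * First non-null layer wins: interface, then static map, then global default.
 * Negative values are normalized to 0, which means "persist forever".
 */
function resolveTtl(object $model, array $modelTtls, int $globalTtl): int
{
    $layers = [
        $model instanceof HasLadaTtl ? $model->getLadaTtl() : null,
        $modelTtls[$model::class] ?? null,
        $globalTtl,
    ];

    foreach ($layers as $ttl) {
        if ($ttl !== null) {
            return max(0, $ttl);
        }
    }

    return 0; // unreachable: $globalTtl is never null
}
```

A model returning `null` from `getLadaTtl()` simply defers to the next layer, which is what keeps the global `expiration_time` default intact for models that do not opt in.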
Why an interface and not a `$ladaTtl` public property:
- Traits cannot enforce property type across consumers; a public
property would collide with any model column named `ladaTtl`
- Octane-safe: no mutable shared state on a long-lived model instance
- PHPStan-narrowable: `instanceof HasLadaTtl` gives full type info,
whereas `property_exists($model, 'ladaTtl')` does not narrow
Files:
- src/Contracts/HasLadaTtl.php - new interface
- src/TtlResolver.php - new Octane-safe resolver service
- src/Cache.php - `set(..., ?int $ttl = null)` now accepts per-call TTL
- src/QueryHandler.php - resolves model TTL before each cache write
- src/LadaCacheServiceProvider.php - registers `lada.ttl_resolver` singleton
- config/lada-cache.php - new `model_ttls` config section
- tests/Integration/Cache/PerModelTtlTest.php - 9 tests covering resolver +
Cache::set TTL semantics (explicit / zero / null fallback)
Automatically derives per-model TTLs by sampling Redis OBJECT IDLETIME for
cached keys belonging to each Lada-cached model. Removes the guesswork of
manually tuning model_ttls config values.
Algorithm
=========
For each Eloquent model using LadaCacheTrait:
1. SCAN the tag sets (lada:tags:database:*:table_*:<table>) for cache keys.
2. SSCAN each tag set's members (avoids SMEMBERS blocking on huge sets).
3. Pipeline OBJECT IDLETIME per key (single round-trip per 100 keys).
4. Compute P50, P95, max of the idle-time distribution.
5. calibrated_ttl = max(ceil(P95 * safety_factor), floor(previous_ttl / 2))
The floor term is critical: OBJECT IDLETIME can only sample keys that have
not yet been evicted, so without it successive runs would monotonically
shrink TTL toward zero (survivor bias).
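Steps 4 and 5 can be sketched without Redis by feeding in idle times that were already sampled. This is an illustrative sketch only: the nearest-rank percentile method and the function names are assumptions, since the commit message does not say how the command computes P95.

```php
<?php
/**
 * Nearest-rank percentile of a non-empty list of numbers (assumed method).
 */
function percentile(array $values, float $p): float
{
    if ($values === []) {
        throw new InvalidArgumentException('percentile() needs at least one sample');
    }

    sort($values);
    $rank = max(1, (int) ceil(($p * count($values)) / 100));

    return (float) $values[$rank - 1];
}

/**
 * calibrated_ttl = max(ceil(P95 * safety_factor), floor(previous_ttl / 2))
 * The second term is the survivor-bias floor; it is 0 on the first run.
 */
function calibratedTtl(array $idleSeconds, float $safetyFactor, ?int $previousTtl): int
{
    $fromSamples = (int) ceil(percentile($idleSeconds, 95.0) * $safetyFactor);
    $floor = $previousTtl === null ? 0 : intdiv($previousTtl, 2);

    return max($fromSamples, $floor);
}
```

With idle times of 1 to 100 seconds and a safety factor of 1.5, the sample term is 143 seconds; a previous TTL of 400 lifts the result to 200, so a single run can at most halve the TTL.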
Resolution order in TtlResolver becomes:
1. HasLadaTtl interface
2. lada_cache_calibrations.calibrated_ttl <- new
3. config('lada-cache.model_ttls.<FQCN>')
4. global expiration_time
Calibration is gated by config('lada-cache.calibration.enabled') (default
false) and the resolver layer is decoupled from DB availability — repo
errors fall through to config rather than throwing.
Safety
======
- Refuses to run when Redis maxmemory-policy is *-lfu (IDLETIME unsupported).
- Dry-run by default; --apply is required to persist.
- Skips models with fewer than min_samples data points (including 0).
- --safety-factor <= 0 aborts with INVALID exit code.
- Defensive nullable constructor deps so the command stays instantiable
even when Lada is disabled (graceful SUCCESS path).
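The first, third, and fourth guards are pure checks on values, so they can be sketched in isolation. The function names below are hypothetical, and the policy string is assumed to have been read from Redis beforehand (for example via `CONFIG GET maxmemory-policy`).

```php
<?php
/**
 * Returns a reason to abort, or null when calibration may proceed.
 * Redis LFU policies are named "allkeys-lfu" and "volatile-lfu".
 */
function calibrationPreflight(string $maxmemoryPolicy, float $safetyFactor): ?string
{
    if (str_ends_with(strtolower(trim($maxmemoryPolicy)), '-lfu')) {
        return 'OBJECT IDLETIME is not available under an LFU eviction policy';
    }

    if ($safetyFactor <= 0) {
        return 'safety factor must be greater than zero';
    }

    return null;
}

/**
 * A model is calibrated only with at least $minSamples samples,
 * and never with zero samples.
 */
function hasEnoughSamples(array $idleSeconds, int $minSamples): bool
{
    $count = count($idleSeconds);

    return $count > 0 && $count >= $minSamples;
}
```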
Performance
===========
SSCAN + pipelined OBJECT IDLETIME is ~50-100x faster than the naive
SMEMBERS + sequential OBJECT IDLETIME loop on large caches. Cursor-driven
iteration also keeps Redis non-blocking — safe to run in production.
The repository caches the full calibration map in-memory
(Cache::remember, config.cache_ttl seconds) so the hot resolution path
incurs no DB hit between calibration runs.
Files
=====
- src/Calibration/TtlCalibrationRepository.php (new)
- src/Console/CalibrateCommand.php (new)
- src/Redis.php (+scanKeys, +sScanMembers)
- src/TtlResolver.php (+calibration layer)
- src/LadaCacheServiceProvider.php (register repo/command/migrations)
- config/lada-cache.php (calibration block)
- database/migrations/2024_01_01_000001_create_lada_cache_calibrations_table.php
- tests/Unit/Calibration/TtlCalibrationRepositoryTest.php (7 tests)
- tests/Console/CalibrateCommandTest.php (12 tests)
- README.md (Auto-calibration section)
Depends on spiritix#152 (per-model TTL via HasLadaTtl interface + TtlResolver).
…manual run FAILURE
Two related guard bugs in the common disabled-mode case:
1. Singleton resolver: 'command.lada-cache.calibrate' singleton only guarded
on 'lada-cache.active', so when calibration.enabled is false (default)
the resolver still called app->make('lada.redis'). Container resolution
fires during package:discover too — in Docker build / CI contexts
without Redis, this crashed the build with 'Connection refused'.
2. CalibrateCommand::handle() guard order: deps null check ran BEFORE
calibration.enabled flag check, so manual 'php artisan lada-cache:calibrate'
on default config (active=true + calibration.enabled=false) returned
Command::FAILURE + 'dependencies are not bound' error — should return
clean SUCCESS with disabled warning.
Fix: gate both on calibration.enabled before touching Redis.
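The corrected guard order in `handle()` can be sketched as a pure function. This is an illustration of the ordering only, with hypothetical names; the constants mirror the values of Symfony's `Command::SUCCESS` (0) and `Command::FAILURE` (1).

```php
<?php
const CMD_SUCCESS = 0;
const CMD_FAILURE = 1;

/**
 * Checks the feature flag before the dependency check, so a default
 * install (calibration disabled) exits cleanly without touching Redis.
 * Returns [exit code, message].
 */
function calibrateGuards(bool $calibrationEnabled, bool $depsBound): array
{
    if (!$calibrationEnabled) {
        return [CMD_SUCCESS, 'calibration is disabled'];
    }

    if (!$depsBound) {
        return [CMD_FAILURE, 'dependencies are not bound'];
    }

    return [CMD_SUCCESS, 'ok'];
}
```

With the checks in the opposite order, the disabled case with unbound dependencies would return the failure branch, which is the second bug described above.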
Hi @acaliskol, same question as on your other pull request - with what use cases in mind did you develop this feature? Is that mainly to keep the cache size as small as possible?
Add an opt-in LadaCacheActivity event dispatched on every cache hit / miss /
invalidate, plus a bundled StatsCounter listener that aggregates per-table
activity into hourly Redis HASH buckets:
lada:stats:YYYYMMDDHH
field "users:hit" → 42819
field "users:miss" → 512
field "users:invalidate" → 120
Disabled by default (`events.enabled` / `stats.enabled`) so unused installs
incur zero overhead on the query hot path.
The counter buffers in process memory and flushes when distinct keys exceed
the batch threshold, when the time interval elapses, or when the application
terminates — works under FPM, Octane, queue workers, and the scheduler.
Also wires StatsReader (single-pipeline aggregation across a lookback window)
so subsequent commits can use the buckets as a calibrate-time signal.
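The bucket layout and the size-triggered flush can be sketched as follows. This is a simplified stand-in, not the bundled `StatsCounter`: the class name and the flush callback are hypothetical, the time-interval and terminate triggers are omitted, and the bucket key uses `gmdate()` in line with the UTC change made in a later commit of this PR.

```php
<?php
/** Hourly bucket name, e.g. lada:stats:2024010113 (UTC). */
function bucketKey(int $timestamp): string
{
    return 'lada:stats:' . gmdate('YmdH', $timestamp);
}

/** Buffers counts in memory and hands them to a callback in one batch. */
final class BufferedCounter
{
    /** @var array<string, array<string, int>> bucket => field => count */
    private array $pending = [];

    public function __construct(
        private int $batchThreshold,
        private Closure $flushTo,
    ) {
    }

    public function record(string $table, string $action, int $now): void
    {
        $bucket = bucketKey($now);
        $field = $table . ':' . $action;
        $this->pending[$bucket][$field] = ($this->pending[$bucket][$field] ?? 0) + 1;

        // Flush once the number of distinct (bucket, field) keys exceeds the threshold.
        if (array_sum(array_map('count', $this->pending)) > $this->batchThreshold) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->pending === []) {
            return; // empty buffer is a no-op
        }

        $batch = $this->pending;
        $this->pending = [];
        ($this->flushTo)($batch);
    }
}
```

In a real listener the callback would issue one `HINCRBY` per field inside a pipeline; here it is left to the caller so the sketch stays free of Redis.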
…t controller)
Enrich lada-cache:calibrate with per-table read / write counts from
StatsCounter and label each model by signal source:
- idletime_only: StatsReader unavailable or Redis lookup failed
- no_activity: reads + writes below min_reads_for_signal (cold table)
- write_heavy: invalidates / (hits+misses) >= write_heavy_ratio →
skip survivor-bias floor (writes invalidate anyway)
- read_heavy: default — convergent proportional control pulls TTL
toward target_hit_ratio by bounded steps
HitRatioAdjustment is a static, dependency-free implementation of the
controller:
deviation = target - hitRatio
adjustment = clamp(1 + learning_rate × deviation, 1±max_step)
ttl = max(1, ceil(raw × adjustment))
Stability invariants:
- Bounded per-run change (max_step) → no overshoot
- Hysteresis deadband around target → no oscillation
- max_step capped at 0.95 → adjustment never collapses to ≤ 0
- max(1, ...) final floor → TTL never reaches 0 ("persist forever")
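The controller formulas and invariants above can be sketched as one function. This is a hedged reconstruction from the commit message, not the `HitRatioAdjustment` source: the function name, the `$deadband` parameter, and the choice to return the raw TTL unchanged inside the deadband are assumptions.

```php
<?php
/**
 * Proportional step toward the target hit ratio:
 *   deviation  = target - hitRatio
 *   adjustment = clamp(1 + learning_rate * deviation, 1 - max_step, 1 + max_step)
 *   ttl        = max(1, ceil(raw * adjustment))
 */
function adjustTtl(
    int $rawTtl,
    float $hitRatio,
    float $target,
    float $learningRate,
    float $maxStep,
    float $deadband,
): int {
    $maxStep = min($maxStep, 0.95); // adjustment can never reach 0 or below
    $deviation = $target - $hitRatio;

    // Inside the deadband the TTL is left alone, which avoids oscillation.
    if (abs($deviation) <= $deadband) {
        return max(1, $rawTtl);
    }

    $adjustment = max(1 - $maxStep, min(1 + $maxStep, 1 + $learningRate * $deviation));

    return max(1, (int) ceil($rawTtl * $adjustment));
}
```

A hit ratio below target gives a positive deviation and a longer TTL; a ratio above target shortens it, with each run bounded by `max_step`.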
null vs [] distinction in loadActivity() preserves "Redis down" as a
distinct signal from "cold tables", so monitoring can alert on outages.
The IDLETIME-only behavior is preserved as the fallback when stats are
unavailable, disabled, or below the min_reads threshold — installs that
opt out of events continue to work exactly as before.
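The four signal labels listed in this commit message can be sketched as a classifier over per-table counts. The function name and the array shape (`hit`, `miss`, `invalidate`) are assumptions, as is treating a table with writes but no reads as write-heavy to avoid dividing by zero.

```php
<?php
/**
 * $counts is null when stats could not be read, otherwise an array such as
 * ['hit' => 42819, 'miss' => 512, 'invalidate' => 120].
 */
function classifySignal(?array $counts, int $minReadsForSignal, float $writeHeavyRatio): string
{
    if ($counts === null) {
        return 'idletime_only';
    }

    $reads = ($counts['hit'] ?? 0) + ($counts['miss'] ?? 0);
    $writes = $counts['invalidate'] ?? 0;

    if ($reads + $writes < $minReadsForSignal) {
        return 'no_activity';
    }

    if ($reads === 0 || $writes / $reads >= $writeHeavyRatio) {
        return 'write_heavy';
    }

    return 'read_heavy';
}
```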
`Application::terminating()` fires once at worker shutdown under Octane — not per-request as it does under FPM. Without a per-request hook the StatsCounter in-memory buffer could persist for minutes or hours and be lost on worker crash, OOM, or graceful restart. Add a `Laravel\Octane\Events\RequestHandled` listener guarded with `class_exists()` so the package does not require laravel/octane as a hard dependency. Both listeners are idempotent (empty buffer is a no-op) and the FPM terminating hook is retained as the correct primitive there. Comment updated to document the dual-hook strategy.
Four hot-path / robustness fixes:
* UTC bucket keys (`gmdate` instead of `date`) in StatsCounter::bucketKey()
and StatsReader::bucketKeysForLookback(). Writer and reader using the
server's local TZ would produce mismatched bucket names if they ran
on hosts with different `date.timezone` settings, silently dropping data.
* Bounded `$pending` buffer in StatsCounter. A sustained Redis outage
combined with the restore-on-failure path could grow the buffer
unboundedly and OOM the worker. Adds `$maxPendingSize` (default 10000):
excess entries are dropped (oldest first) and an error is logged so
operators can see the buffer is being shed.
* QueryHandler caches `lada-cache.events.enabled` at construction. The
flag is read once per worker (QueryHandler is a singleton) instead of
paying for a `config()` call on every cache hit / miss / invalidate.
* Drop unused imports from LadaCacheActivity. `CalibrateCommand` and
`StatsCounter` were imported only for docblock `{@see}` references;
switch the docblock to plain prose so the file no longer carries
coupling it does not use.
* Drop the `array` type annotation from `StatsReader::DEFAULT_ACTIONS`.
Typed class constants require PHP 8.3; dropping the annotation keeps the
package installable on the older PHP versions that some downstream
projects still run.
* `CalibrateCommand` previously passed `bool $statsAvailable` into
`adjustForActivity()`, collapsing the (null, [], array) tri-state from
`loadActivity()` and obscuring the 'idletime_only' vs 'no_activity'
distinction that monitoring relies on. Replace with an explicit
`$statsState: 'unavailable'|'empty'|'available'` derived once at the
top of `handle()` via a `match (true)` block. The label semantics
emitted to the signal counter are unchanged; the code is just easier
to follow.
* `warnIfLookbackExceedsBucketTtl()` no longer early-returns when
`StatsReader` is null. The warning is about config (lookback >
bucket retention) — operators should see it now so the next env that
flips stats on starts with a clean configuration.
* Annotate the dynamic `$client->{'exec'}()` call (Redis pipeline flush,
not OS exec) so future static-analyzer false positives don't trigger
unnecessary refactors.
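The tri-state derivation mentioned in the first item of this list can be sketched with a `match (true)` block. The function name is hypothetical; the three state strings are the ones quoted in the commit message.

```php
<?php
/**
 * Maps the result of loading activity onto an explicit state:
 * null  => stats could not be read (for example Redis down),
 * []    => stats were read but contain nothing,
 * other => usable activity data.
 */
function statsState(?array $activity): string
{
    return match (true) {
        $activity === null => 'unavailable',
        $activity === [] => 'empty',
        default => 'available',
    };
}
```

Keeping `null` and `[]` apart is what lets a boolean-free caller distinguish an outage from a cold table.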
…back > TTL
Four new edge-case tests covering the StatsCounter / StatsReader changes
in the preceding commits:
* `test_bucket_key_uses_utc_to_avoid_tz_drift` — regression guard for the
`date('YmdH')` → `gmdate('YmdH')` switch.
* `test_pipeline_failure_restores_pending_for_retry` — verifies the
swallow-then-restore behavior so a failed flush is retried, not lost.
* `test_overflow_drops_oldest_when_pending_exceeds_cap` — drives the
restore path with a tiny `maxPendingSize` to confirm oldest-first
eviction and that the buffer is bounded under sustained Redis outage.
* `test_read_treats_missing_old_buckets_as_zero_contribution` — proves
that `stats_lookback_hours` > `bucket_ttl_seconds/3600` is safe: the
expired buckets simply contribute zero rather than failing the read.
Existing tests that hardcoded `date('YmdH')` for the bucket name are
updated to `gmdate('YmdH')` so they stay correct in non-UTC test
environments now that the writer produces UTC keys.
The two final-mockability tests rely on a tiny helper that mocks
`Illuminate\Redis\Connections\Connection` (the injectable dep) and
wraps it in a real `Redis` proxy, since `Redis` itself is final readonly.
Addressed the review feedback across four atomic commits:
Three new tests cover: pipeline failure restoring $pending, overflow eviction, and `lookback_hours > bucket_retention` reading missing buckets as zero. Pre-existing test failures in
…tion
# Conflicts:
#	src/LadaCacheServiceProvider.php
Sorry I missed your actual question earlier, so let me answer it directly.
Use cases that motivated this PR
We run Lada Cache across ~3M Redis keys in a Laravel app with ~80 cacheable models. Two pain points kept coming up:
On "is it mainly to keep cache size small?"
Honestly, no: that's a byproduct, not the goal. The real target is the hit-ratio × invalidation-cost sweet spot per table. In production we saw:
Safety guards that make it cron-able (
Happy to scope down if any piece feels out of charter for the library, e.g. the StatsCounter half could live as a separate package and
PHPStan annotations on the PR diff were flagging `__call` magic methods (set, get, exists, sadd, del, unlink, multi, exec, pipeline, etc.) as undefined and mixed→int/string casts as unsafe.
- Redis: 13 @method declarations covering all forwarded commands.
- Cache::__construct: type-narrow config('lada-cache.expiration_time') via is_numeric before int cast.
- Cache::flush: type-narrow config('database.redis.options.prefix') via is_string instead of (string)($x ?? '').
- Cache::repairTagMembership: @param array<string> $tags.
Resolves 25 of 209 PHPStan max-level errors. Tests green (25/25).
Replaces `is_numeric ? (int) : 0` defensive narrowing with Laravel's typed config accessor Config::integer(), which returns a strict int and satisfies PHPStan level=max without ad-hoc type checks. Prefix lookup keeps the manual is_string narrow because Config::string() throws on the legacy `false` value still used by tests (TestCase::71) and some downstream configs — a small comment now documents this BC quirk.
For large model catalogs (200+) the previous per-model `repository->upsert()` issued one UPDATE/INSERT per row plus a cache bust each time. Schools with ~500 cached models would burn ~500 DB round-trips per nightly calibration.
- TtlCalibrationRepository::upsertMany(array): bulk UPSERT in a single SQL statement with one cache bust at the end.
- TtlCalibrationRepository::upsert(): now delegates to upsertMany.
- CalibrateCommand: buffer --apply rows; flush every batch_size models (default 100, configurable via `lada-cache.calibration.batch_size`). Residual rows flushed after the loop.
Tests: 21 passed (CalibrateCommandTest + TtlCalibrationRepositoryTest).
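The buffer-and-flush pattern described for `CalibrateCommand` can be sketched independently of the database. This is an illustration with hypothetical names: `$upsertMany` stands in for the repository's bulk write and receives one array of rows per call.

```php
<?php
/**
 * Collects rows and hands them to $upsertMany in chunks of $batchSize,
 * flushing any residual rows after the loop. Returns the number of flushes.
 */
function flushInBatches(iterable $rows, int $batchSize, callable $upsertMany): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batch size must be at least 1');
    }

    $buffer = [];
    $flushes = 0;

    foreach ($rows as $row) {
        $buffer[] = $row;

        if (count($buffer) >= $batchSize) {
            $upsertMany($buffer);
            $buffer = [];
            $flushes++;
        }
    }

    if ($buffer !== []) {
        $upsertMany($buffer);
        $flushes++;
    }

    return $flushes;
}
```

For 500 models and the default batch size of 100, this issues 5 bulk writes instead of 500 single-row writes.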
The CalibrateCommand reads this key with a hardcoded 100 fallback; adding it to the published config keeps it discoverable alongside the other calibration knobs (cache_ttl, schedule, min_samples, etc.).
a3b9947 to 0fb5dc2
Calibration now owns the activity collection path behind a single LADA_CACHE_CALIBRATION_ENABLED flag. The config introduces the feature first, moves model_ttls after the feature description, and removes separate public toggles for internal activity plumbing.
The auto schedule now uses LADA_CACHE_CALIBRATION_SCHEDULE_INTERVAL as a day interval, so the public surface is "run every N days" instead of accepting cron syntax.
CalibrateCommand uses typed dependency accessors so disabled command construction stays safe while PHPStan can prove the active path is non-null.
Rejected: Keep separate event/stat enabled env vars | duplicates the calibration switch and exposes implementation plumbing
Rejected: Keep LADA_CACHE_CALIBRATION_SCHEDULE as a cron expression | the requested public API is an N-day interval
Rejected: Type-clean the rest of LadaCacheServiceProvider in this PR | unrelated uncovered lines hurt patch coverage
Confidence: high
Scope-risk: moderate
Tested: vendor/bin/phpunit --configuration phpunit.xml --no-coverage
Tested: vendor/bin/phpstan analyse --level=max src/Console/CalibrateCommand.php --memory-limit=1G
Tested: vendor/bin/phpstan analyse --level=max src/ --memory-limit=1G (still 139 existing errors outside CalibrateCommand)
Tested: vendor/bin/pint src/LadaCacheServiceProvider.php src/Database/SqliteConnection.php --format=txt
Not-tested: Full src PHPStan is not green because of existing repo-wide errors outside CalibrateCommand
0fb5dc2 to cbcb3b6
Tagging some contributors here to discuss this feature proposal. Let me know what you guys think! Is this the right approach in your opinion? I'd like to get the community more involved for directional changes like this. @kontainer-dam-pim @Tim-streamline @zgetro @duyphuongn @MGApcDev @michael-rubel @ogunsakin01 @diegotibi
Summary
Adds `php artisan lada-cache:calibrate`, a non-blocking command that auto-derives per-model TTLs from real Redis access patterns instead of hand-tuning `config('lada-cache.model_ttls')`.
For each Lada-cached model, the command samples Redis `OBJECT IDLETIME` across the model's cache keys, computes the P95 idle time, and writes a calibrated TTL via:
`calibrated_ttl = max(ceil(P95 * safety_factor), floor(previous_ttl / 2))`
The `floor(previous_ttl / 2)` term guards against survivor bias: `OBJECT IDLETIME` can only sample keys still alive, so without it successive runs would monotonically shrink TTL toward zero.
Results land in `lada_cache_calibrations` (package migration, published via `--tag=migrations`) and are consumed by `TtlResolver` between the `HasLadaTtl` interface and the static `model_ttls` config map.
TtlResolver chain (updated)
1. `HasLadaTtl::getLadaTtl()` (per-instance override)
2. `lada_cache_calibrations.calibrated_ttl` ← new (gated by config flag)
3. `config('lada-cache.model_ttls.<FQCN>')`
4. `expiration_time`
Calibration lookups go through `TtlCalibrationRepository`, which caches the full map in-memory (`Cache::remember`, configurable TTL) so the hot resolution path never hits DB between calibration runs.
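Safety
- Refuses to run when Redis `maxmemory-policy` is `*-lfu` (`IDLETIME` unsupported under LFU; would calibrate every TTL toward zero).
- Dry-run by default; `--apply` required to mutate the calibrations table.
- Skips models with fewer than `min_samples` data points (including 0).
- `--safety-factor <= 0` aborts with `INVALID` exit code.
- Defensive nullable constructor deps keep the command instantiable even when Lada is disabled (graceful SUCCESS path matching `FlushCommand`).
- Repository errors fall through to config rather than throwing, so boot-time stays decoupled from DB availability.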
Performance
- Cursor-driven `SCAN` / `SSCAN` iteration avoids blocking Redis on multi-million-member sets and bounds client memory.
- Pipelines `OBJECT IDLETIME` in batches of 100: one round-trip per batch instead of one per key (~50-100x faster on large caches).
- … drivers without a pipeline API.
Config
Recommended cron:
Files
- `src/Calibration/TtlCalibrationRepository.php` (new)
- `src/Console/CalibrateCommand.php` (new)
- `src/Redis.php`: adds `scanKeys()`, `sScanMembers()` generators
- `src/TtlResolver.php`: calibration layer (optional injection)
- `src/LadaCacheServiceProvider.php`: register repo, command, migrations
- `config/lada-cache.php`: `calibration` block + updated `model_ttls` docblock
- `database/migrations/…_create_lada_cache_calibrations_table.php`
- `tests/Unit/Calibration/TtlCalibrationRepositoryTest.php` (7 tests)
- `tests/Console/CalibrateCommandTest.php` (12 tests)
- `README.md`: Auto-calibration section + new Console Command entry
Test plan
- `php -l` syntax-check on every touched file
- `composer install && vendor/bin/phpunit --filter=Calibrat`
- `php artisan vendor:publish --tag=migrations`
- `php artisan lada-cache:calibrate` (dry-run) in a real app with `HasLadaTtl` models
Dependency note
This PR depends on #152 (HasLadaTtl interface + TtlResolver). It is branched from `feat/per-model-ttl` to keep the diff minimal; once #152 merges, this PR will rebase to a clean diff against `master`.